Determining a rating for a collection of documents

ABSTRACT

On one or more data processing systems, a collection rating is determined for a rating scale for contents of a document collection. A link rating is determined for the rating scale for contents linked to or linked by contents of the document collection. The collection rating for the rating scale for contents of the document collection is then modified, based on the determined link rating for the rating scale for contents linked to or linked by contents of the document collection.

[0001] This application claims priority to provisional application Nos. 60/289,587, 60/289,400 and 60/289,418, all filed on May 7, 2001, entitled “Method of Assigning Ratings to Collections of Related Objects”, “Method and Apparatus for Automatically Determining Salient Features for Object Classification” and “Very-Large-Scale Automatic Categorizer For Web Content”, respectively, each having at least partial common inventorship with the present application.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to the field of data processing. More specifically, the present invention relates to automated methods and systems for determining a rating for a rating scale for a collection of documents.

[0004] 2. Background Information

[0005] The World Wide Web (WWW) is an expanding collection of textual and non-textual material which is available for access to any Internet user, from any location at any time. Some users find particular contents to be objectionable. For example, parents often wish to shield their children from exposure to sexually explicit material, hate speech, and drug information. Similarly, companies may wish to prevent access by employees to web sites that provide or support gambling.

[0006] Notwithstanding the civil liberty implications associated with these concerns, a number of groups and companies have brought forward systems and techniques for assisting Internet users in blocking access to undesired content. For example, various blocking software products are available from software vendors, such as SafeSurf of Newbury Park, Calif., and NetNanny of Bellevue, Wash. Typically, these products employ site lists to effectuate blocking of access to undesired contents. These site lists include the identifications of the web sites containing undesired contents. Access to any of the web pages hosted by the identified web sites is blocked. Another example of such a system is described by Neilsen et al., “Selective downloading of file types contained in hypertext documents transmitted in a computer controlled network”, U.S. Pat. No. 6,098,102, which uses the file extensions of URLs to determine whether particular files will or will not be downloaded to the user. Still another method for controlling access to web sites is typified by the work of the Internet Content Rating Association, which uses the technology of the Platform for Internet Content Selection (PICS) specification to allow voluntary, or in the future potentially mandatory, rating of page content by the content author. Filtering can then be done by utilizing these rating “tags”, and may be augmented by a complete block on other unrated pages.

[0007] These prior art approaches suffer from at least the following disadvantages:

[0008] a) The WWW is constantly growing, and web sites and their contents are constantly changing. As a result, the prior art approaches are unable to keep pace with the changes.

[0009] b) Further, many web sites generate user-specific pages at every access. As a result, the prior art URL-based approaches are unable to block these dynamically generated pages if they contain undesired contents.

[0010] c) Additionally, content providers are often not the best, or even the appropriate, agents for rating their own contents. Duplicitous providers may deliberately mis-rate the appropriateness of their contents.

[0011] Some filtering systems rely on keyword lists or text analysis to judge the content of individual pages. While these systems may work satisfactorily on text files, they are ineffective for non-text materials, such as images, sound files, or movies.

[0012] Thus, an improved approach for blocking undesired contents is desired.

SUMMARY OF THE INVENTION

[0013] On one or more data processing systems, a collection rating is determined for a rating scale for contents of a document collection. A link rating is determined for the rating scale for contents linked to or linked by contents of the document collection. The collection rating for the rating scale for contents of the document collection is then modified, based on the determined link rating for the rating scale for contents linked to or linked by contents of the document collection.

[0014] In one embodiment, a collection rating for a rating scale for a document collection is determined based on document ratings of a subset of the documents of the document collection, and their sizes.

[0015] In one embodiment, the link rating for the rating scale for the document collection is determined based on the collection ratings of the document collections having contents linked to or linked by contents of the document collection.

[0016] In one embodiment, the document collection is a web site, the documents of the document collection are web pages of the web site, and the subset of documents employed to determine the web site rating consists of the textual documents.

[0017] Note: The term “document” as used herein in this application, including the specification and the claims, includes textual as well as non-textual documents, unless one or more types of “documents” are expressly excluded or implicitly excluded in view of the context of the usage.

BRIEF DESCRIPTION OF DRAWINGS

[0018] The present invention will be described by way of exemplary embodiments, but not limitations, illustrated in the accompanying drawings in which like references denote similar elements, and in which:

[0019] FIG. 1 illustrates an overview of the present invention, in accordance with one embodiment;

[0020] FIG. 2 illustrates a method view of the present invention, in accordance with one embodiment;

[0021] FIG. 3 illustrates the operational flow for determining a collection rating, in accordance with one embodiment;

[0022] FIG. 4 illustrates the operational flow for determining a link rating, in accordance with one embodiment; and

[0023] FIG. 5 illustrates a computer system suitable for use to practice the present invention, in accordance with one embodiment.

[0024] Glossary

[0025] URL—Uniform Resource Locator

DETAILED DESCRIPTION OF THE INVENTION

[0026] As summarized earlier, the present invention includes improved methods and related apparatuses for determining a rating for a rating scale for a document collection. In the description to follow, various aspects of the present invention will be described. However, the present invention may be practiced with only some or all aspects of the present invention. For purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the present invention. However, the present invention may be practiced without some of the specific details. In other instances, well known features are omitted or simplified in order not to obscure the present invention.

[0027] Parts of the description will be presented in terms of operations performed by a processor-based device, using terms such as data, analyzing, assigning, selecting, determining, and the like, consistent with the manner commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. As well understood by those skilled in the art, the quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, and otherwise manipulated through mechanical and electrical components of the processor-based device. The term “processor” includes microprocessors, micro-controllers, digital signal processors, and the like, that are standalone, adjunct or embedded.

[0028] Various operations will be described as multiple discrete steps in turn, in a manner that is most helpful in understanding the present invention. However, the order of description should not be construed as implying that these operations are necessarily order dependent. In particular, these operations need not be performed in the order of presentation. Further, the description repeatedly uses the phrase “in one embodiment”, which ordinarily does not refer to the same embodiment, although it may.

Overview

[0029] Referring now to FIG. 1, a block diagram illustrating an overview of the present invention, in accordance with one embodiment, is shown. As illustrated, collection rater 110 of the present invention is equipped to deduce a collection rating 112 for a rating scale for a document collection, such as collection 102. An example of a rating scale is a scale that quantitatively rates the contents of a subject collection on its “offensiveness”, e.g. ranging from 0 to 3, with 0 meaning “not offensive”, 1 meaning “mildly offensive”, 2 meaning “moderately offensive” and 3 meaning “very offensive”. As will be described in more detail below, collection rater 110 advantageously generates collection rating 112 for a collection taking into account not only the contents of the collection, but also contents of other collections linked to or linked by contents of the subject collection, such as collection 104 and collection 106 respectively. As those skilled in the art would appreciate, the inclusion of the contents linked to or linked by contents of the subject collection tends to strengthen the accuracy of the rating generated for the subject collection.

[0030] In one embodiment, collections 102, 104 and 106 are web sites, and documents 103, 105 and 107 are web pages of the web sites, including textual as well as non-textual (such as multi-media) web pages. In alternate embodiments, documents 103, 105 and 107 may be other content objects, with collections 102, 104 and 106 being other organizational entities of the content objects.

Method

[0031] Referring now to FIG. 2, a block diagram illustrating a method view of the present invention, in accordance with one embodiment, is shown. As illustrated, for the embodiment, collection rater 110 generates a collection rating for a rating scale for a subject collection by first determining an initial collection rating for the contents of the subject collection, block 202. Upon so determining, collection rater 110 determines a link rating for the contents of the linked collections, i.e. collections with contents linked to or linked by contents of the subject collection, block 204. Thereafter, for the illustrated embodiment, collection rater 110 modifies the initially determined collection rating using the determined link rating, thereby taking into consideration the “linked” contents, block 206.

[0032] In one embodiment, in block 206, collection rater 110 modifies the initially determined collection rating by replacing the initially determined collection rating with the determined link rating. In another embodiment, in block 206, collection rater 110 modifies the initially determined collection rating by adding the determined link rating to the initially determined collection rating. In yet another embodiment, in block 206, collection rater 110 modifies the initially determined collection rating by subtracting the determined link rating from the initially determined collection rating. In yet other embodiments, in block 206, collection rater 110 may modify the initially determined collection rating by combining the determined link rating with the initially determined collection rating in other alternate manners.

[0033] The manner in which the determined link rating is combined with the initially determined collection rating, so that the modified rating takes the linked contents into account, is application dependent. Preferably, the manner of combination is user configurable. Such user configuration may be facilitated through any one of a number of user configuration techniques known in the art, which are all within the abilities of those ordinarily skilled in the art. Accordingly, no further description of these user configuration techniques is necessary.
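A minimal Python sketch of the FIG. 2 flow follows, assuming the two raters are supplied as callables and that the user-configurable manner of combination is selected by name. The names `rate_collection` and `COMBINERS`, the mode strings and the default of `"add"` are hypothetical illustrations; only the replace, add and subtract behaviors come from paragraph [0032].

```python
from typing import Callable, Dict

# Hypothetical names for the combination modes described for block 206.
COMBINERS: Dict[str, Callable[[float, float], float]] = {
    "replace": lambda collection_rating, link_rating: link_rating,
    "add": lambda collection_rating, link_rating: collection_rating + link_rating,
    "subtract": lambda collection_rating, link_rating: collection_rating - link_rating,
}


def rate_collection(collection_id: str,
                    collection_rater: Callable[[str], float],
                    link_rater: Callable[[str], float],
                    mode: str = "add") -> float:
    """Blocks 202, 204 and 206 of FIG. 2 with a user-selected combination mode."""
    initial = collection_rater(collection_id)   # block 202: rate the collection's own contents
    link = link_rater(collection_id)            # block 204: rate the "linked" contents
    try:
        combine = COMBINERS[mode]
    except KeyError:
        raise ValueError(f"unknown combination mode: {mode!r}") from None
    return combine(initial, link)               # block 206: modify the initial rating
```

Keeping the modes in a dictionary means that another combination rule, one of the "other alternate manners", can be registered without changing `rate_collection` itself.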

Collection Rating

[0034] Referring now to FIG. 3, a block diagram illustrating the manner in which collection rater 110 generates a collection rating for a rating scale for a subject collection, in accordance with one embodiment, is shown. As illustrated, for the embodiment, collection rater 110 generates the collection rating for a rating scale for a subject collection by first determining the individual document ratings for a subset of the documents of the subject collection, block 302. In one embodiment, the subject collection comprises textual as well as non-textual (such as multi-media) documents. For the embodiment, the subset of the documents consists of the textual documents. The individual document ratings for the textual documents may be determined in accordance with any one of a number of document rating techniques, e.g. based on the salient features or keywords of each document. Examples of these document rating techniques include but are not limited to those described in U.S. Provisional Applications Nos. 60/289,400 and 60/289,418, entitled “METHOD AND APPARATUS FOR AUTOMATICALLY DETERMINING SALIENT FEATURES FOR OBJECT CLASSIFICATION” and “VERY-LARGE-SCALE AUTOMATIC CATEGORIZER FOR WEB CONTENT” respectively, both filed on May 7, 2001. Both applications are hereby fully incorporated by reference.

[0035] In accordance with the present invention, in addition to determining the individual document ratings of the subset of the documents, collection rater 110 further determines the sizes of the documents, block 304. Then, collection rater 110 determines the collection rating by combining the determined individual document ratings in a size and rating normalized manner, block 306.
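The following Python sketch illustrates blocks 302 and 304 under two stated assumptions: a document counts as textual when its MIME type starts with `text/`, and the document rating technique is passed in as an opaque callable, since the text defers that technique to the incorporated applications. The `Document` class and the `rate_textual_subset` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Document:
    url: str
    content_type: str   # assumed MIME type, e.g. "text/html" or "image/jpeg"
    body: bytes


def rate_textual_subset(documents: List[Document],
                        document_rater: Callable[[Document], int]) -> List[Tuple[int, int]]:
    """Blocks 302 and 304: rate the textual documents and record their sizes in bytes."""
    results: List[Tuple[int, int]] = []
    for doc in documents:
        if not doc.content_type.startswith("text/"):
            continue                      # assumption: non-text MIME types are outside the subset
        rating = document_rater(doc)      # block 302: any document rating technique
        size = len(doc.body)              # block 304: document size in bytes
        results.append((rating, size))
    return results
```

The returned (rating, size) pairs are what block 306 combines in a size and rating normalized manner.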

[0036] More specifically, in one embodiment, collection rater 110 combines the determined individual document ratings in a size and rating normalized manner by grouping the documents in accordance with their determined sizes and determined ratings, and applying weights to the determined document ratings in accordance with their size group and rating group membership. In one embodiment, the weights are applied in accordance with the size groups and determined ratings as set forth by the tables below:

Document size range (bytes)    Weight
< 500                          1
500-999                        4
1000-4999                      7
5000-9999                      10
> 9999                         13

Determined document rating for said rating scale    Weight
0                                                   −0.5
1                                                   0.5
2                                                   3
3                                                   6
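The two weight tables of paragraph [0036] can be expressed as simple lookups. This Python sketch assumes document sizes are whole numbers of bytes, so that each size range ends exactly where the next one begins; the function and constant names are hypothetical.

```python
# (exclusive upper bound in bytes, weight) for each size range in the table
SIZE_WEIGHTS = [(500, 1), (1000, 4), (5000, 7), (10000, 10)]
LARGEST_SIZE_WEIGHT = 13  # documents larger than 9999 bytes

# Weight for each determined document rating on the 0-3 "offensiveness" scale
RATING_WEIGHTS = {0: -0.5, 1: 0.5, 2: 3.0, 3: 6.0}


def size_weight(size_bytes: int) -> int:
    """Return the size-group weight for a document of the given size in bytes."""
    for upper, weight in SIZE_WEIGHTS:
        if size_bytes < upper:
            return weight
    return LARGEST_SIZE_WEIGHT


def rating_weight(rating: int) -> float:
    """Return the rating-group weight; raises KeyError for ratings outside 0-3."""
    return RATING_WEIGHTS[rating]
```

For example, a 4,200-byte page falls in the 1000-4999 range and receives size weight 7, while a page rated 2 receives rating weight 3.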

[0037] The weights are applied in accordance with the formula set forth below:

$$CR = \frac{\sum_{i,j} r_i \, w_j \log\left(N_{ij} + 1\right)}{\sum_{i,j} w_j \log\left(N_{ij} + 1\right)}$$

[0038] where CR is the collection rating for the rating scale;

[0039] r_(i) is the weight applied for document rating group i;

[0040] w_(j) is the weight applied for document size group j;

[0041] N_(ij) is the number of pages in the collection with document rating i and in document size group j for the rating scale.
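As a sketch of how the formula might be evaluated, the Python function below counts N_(ij) over (rating, size) pairs and forms the weighted ratio. It repeats the weight tables so that it stands alone, and it assumes the natural logarithm; the text does not state a base, but any base gives the same CR because the constant conversion factor cancels between numerator and denominator. Cells with N_(ij) = 0 contribute log(1) = 0 and can therefore be skipped. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

RATING_WEIGHTS = {0: -0.5, 1: 0.5, 2: 3.0, 3: 6.0}           # r_i
SIZE_BOUNDS = [(500, 1), (1000, 4), (5000, 7), (10000, 10)]   # (exclusive upper bound, w_j)


def size_weight(size_bytes: int) -> int:
    for upper, weight in SIZE_BOUNDS:
        if size_bytes < upper:
            return weight
    return 13  # > 9999 bytes


def collection_rating(pages: Iterable[Tuple[int, int]]) -> float:
    """pages: (document rating i, size in bytes) for each page of the rated subset."""
    # N_ij keyed by (rating, size weight); each size group has a distinct weight,
    # so the weight identifies the group. Empty cells (N_ij = 0) never appear.
    counts = Counter((rating, size_weight(size)) for rating, size in pages)
    numerator = denominator = 0.0
    for (rating, w_j), n_ij in counts.items():
        term = w_j * math.log(n_ij + 1)
        numerator += RATING_WEIGHTS[rating] * term
        denominator += term
    if denominator == 0.0:
        # Assumption: with no rated pages the ratio is 0/0, so report an error.
        raise ValueError("no rated pages in the collection")
    return numerator / denominator
```

For instance, three 2000-byte pages rated 0 and one 600-byte page rated 3 give CR = (−0.5·7·log 4 + 6·4·log 2)/(7·log 4 + 4·log 2) = 17/18 ≈ 0.94, since log 4 = 2 log 2.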

[0042] In alternate embodiments, for different rating scales, different rating and/or size group based weighting schemes, as well as other weighting schemes, may be employed instead.

Link Rating

[0043] Referring now to FIG. 4, a block diagram illustrating the manner in which collection rater 110 generates a link rating for a rating scale for a subject collection, in accordance with one embodiment, is shown. As illustrated, for the embodiment, collection rater 110 generates the link rating for a rating scale for a subject collection by first generating the collection ratings for the collections having contents either linked to or linked by contents of the subject collection, block 402. The collection rating for the rating scale for each of the collections with contents either linked to or linked by contents of the subject collection may be generated in the same manner as the collection rating for the rating scale for the subject collection, e.g. as earlier described, or in a different manner.
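Block 402 first needs the set of "linked" collections. A Python sketch follows, assuming page-level links are available as (source URL, target URL) pairs and that each web-site collection is identified by its host name as returned by `urllib.parse.urlsplit`; both assumptions and the function names are illustrative only. Links are followed in both directions, matching "linked to or linked by", and links between pages of the same collection are ignored.

```python
from typing import Iterable, Set, Tuple
from urllib.parse import urlsplit


def collection_of(url: str) -> str:
    """Assumption: a web-site collection is identified by the URL's host name."""
    return urlsplit(url).hostname or ""


def linked_collections(subject: str, links: Iterable[Tuple[str, str]]) -> Set[str]:
    """Collections with pages linked to, or linking to, pages of the subject collection."""
    result: Set[str] = set()
    for source, target in links:
        src, dst = collection_of(source), collection_of(target)
        if not src or not dst or src == dst:
            continue              # skip host-less URLs and intra-collection links
        if src == subject:
            result.add(dst)       # outbound: a subject page links to a page of dst
        elif dst == subject:
            result.add(src)       # inbound: a page of src links to a subject page
    return result
```

Each collection in the returned set would then be rated, for example with the same size and rating weighted formula used for the subject collection.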

[0044] Upon so determining, for the illustrated embodiment, collection rater 110 sums the determined collection ratings for the rating scale for the other collections, block 404, then generates the link rating based on the resulting sum, block 406. In one embodiment, collection rater 110 generates the link rating based on the resulting sum in accordance with the discrete “step” function set forth below:

Resulting sum (RS)                                                  Link rating
RS less than −2                                                     −1.0
RS greater than or equal to −2, but less than −1                    −0.5
RS greater than or equal to −1, but less than or equal to −0.5      0
RS greater than −0.5, but less than or equal to 1.5                 0.5
RS greater than 1.5, but less than or equal to 3                    1.0
RS greater than 3, but less than or equal to 4                      1.5
RS greater than 4                                                   2.0
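Blocks 404 and 406 reduce to a sum followed by the step function of paragraph [0044]. The Python sketch below encodes each row's bounds, including which endpoints are inclusive; the function name is hypothetical.

```python
from typing import Iterable


def link_rating(linked_collection_ratings: Iterable[float]) -> float:
    """Blocks 404 and 406: sum the linked collections' ratings, then apply the step function."""
    rs = sum(linked_collection_ratings)   # block 404: resulting sum (RS)
    if rs < -2:
        return -1.0
    if rs < -1:        # -2 <= RS < -1
        return -0.5
    if rs <= -0.5:     # -1 <= RS <= -0.5
        return 0.0
    if rs <= 1.5:      # -0.5 < RS <= 1.5
        return 0.5
    if rs <= 3:        # 1.5 < RS <= 3
        return 1.0
    if rs <= 4:        # 3 < RS <= 4
        return 1.5
    return 2.0         # RS > 4
```

With no linked collections the sum is 0, which this function maps to 0.5; a caller that treats the zero-link case separately would need to check for it before calling.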

[0045] In alternate embodiments, the link rating may be generated from the determined collection ratings of the “linked” collections employing different functions.

[0046] Accordingly, under the present invention, “linked” contents are taken into consideration to potentially strengthen the accuracy of the rating generated for a rating scale for a subject collection. As those skilled in the art would appreciate, the present invention may be practiced for one or more rating scales on one or more subject collections, each having zero or more “linked” collections. A subject collection with zero “linked” collections is merely a degenerate case, where no “linked” content contribution can be extracted to potentially strengthen the accuracy of the ratings generated for the rating scales for the subject collection.
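One possible way to apply the method to many subject collections for a single rating scale is sketched below in Python. It assumes that linked collections contribute their initial (unmodified) ratings, so the result does not depend on processing order; that linked collections without a rating of their own are skipped; and that the degenerate zero-link case leaves the initial rating unchanged. The link rating and combination rules are passed in as callables, and all names are hypothetical. For several rating scales the same routine would simply be run once per scale.

```python
from typing import Callable, Dict, List


def rate_all(initial_ratings: Dict[str, float],
             links: Dict[str, List[str]],
             link_rater: Callable[[List[float]], float],
             combine: Callable[[float, float], float]) -> Dict[str, float]:
    """Rate every subject collection for one rating scale.

    initial_ratings: collection -> rating from its own contents (block 202)
    links: collection -> collections linked to or linked by it
    """
    final: Dict[str, float] = {}
    for collection, initial in initial_ratings.items():
        # Assumption: only linked collections that have their own rating contribute.
        neighbors = [n for n in links.get(collection, []) if n in initial_ratings]
        if not neighbors:
            # Degenerate case: no "linked" contribution, keep the initial rating.
            final[collection] = initial
            continue
        # Assumption: neighbors contribute their initial, unmodified ratings.
        link = link_rater([initial_ratings[n] for n in neighbors])   # block 204
        final[collection] = combine(initial, link)                   # block 206
    return final
```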

Example Computer System

[0047] FIG. 5 illustrates an exemplary computer system 500 suitable for use to practice the present invention, in accordance with one embodiment. As shown, computer system 500 includes one or more processors 502 and system memory 504. Additionally, computer system 500 includes one or more mass storage devices 506 (such as diskette, hard drive, CDROM and so forth), one or more input/output devices 508 (such as keyboard, cursor control and so forth) and communication interfaces 510 (such as network interface cards, modems and so forth). The elements are coupled to each other via system bus 512, which represents one or more buses. In the case of multiple buses, they are bridged by one or more bus bridges (not shown). Each of these elements performs its conventional functions known in the art. In particular, system memory 504 and mass storage 506 are employed to store a working copy (514a) and a permanent copy (514b) of the programming instructions implementing the teachings of the present invention (collection categorizer). The permanent copy (514b) of the programming instructions may be loaded into mass storage 506 in the factory, or in the field, as described earlier, through a distribution medium (not shown) or through communication interface 510 (from a distribution server (not shown)). The constitution of these elements 502-512 is known, and accordingly will not be further described.

[0048] In alternate embodiments, the present invention may be practiced on multiple systems sharing common and/or networked storage.

Modifications and Alterations

[0049] While the present invention has been described with reference to the illustrated and above-enumerated embodiments, the present invention is not limited to these described embodiments. Numerous modifications and alterations may be made, consistent with the scope of the present invention as set forth in the claims to follow. Of course, the above examples are merely illustrative. Based on the above descriptions, many other equivalent variations will be appreciated by those skilled in the art.

Conclusion and Epilogue

[0050] Thus, a method and apparatus for generating a collection rating for a document collection comprising textual and non-textual documents has been described. Since, as illustrated earlier, the present invention may be practiced with modification and alteration within the spirit and scope of the appended claims, the description is to be regarded as illustrative, instead of being restrictive on the present invention.

What is claimed is:
1. A method of operation on one or more data processing machines, the method comprising: determining a first collection rating for a first rating scale for contents of a first document collection; determining a first link rating for said first rating scale for contents linked to or linked by contents of said first document collection; and modifying said first collection rating for said first rating scale for contents of said first document collection based on said determined first link rating for said first rating scale for contents linked to or linked by contents of said first document collection.
2. The method of claim 1, wherein said determining of a first collection rating comprises determining said first collection rating based on document ratings of a first subset of documents of said first collection of documents, and sizes of the documents of the first subset of documents of the first document collection.
3. The method of claim 2, wherein said first subset of documents of said first document collection consists of first textual documents of said first document collection.
4. The method of claim 1, wherein said determining of a first link rating comprises determining at least a second collection rating for at least a second document collection with documents linked to or linked by documents of said first document collection, and determining said first link rating based on said determined at least a second collection rating of said at least a second document collection.
5. The method of claim 1, wherein said modifying of the first collection rating comprises replacing the determined first collection rating with said determined first link rating.
6. The method of claim 1, wherein said modifying of the first collection rating comprises adding said determined first link rating to the determined first collection rating.
7. The method of claim 1, wherein said modifying of the first collection rating comprises subtracting said determined first link rating from the determined first collection rating.
8. The method of claim 1, wherein said first document collection is a web site, and said contents of said first document collection are web pages.
9. A method of operation on one or more data processing machines, the method comprising: determining document ratings for a rating scale for a subset of documents of a document collection; determining sizes of the documents of said subset; determining a collection rating for said rating scale for said document collection based on said determined document ratings of said subset of documents, and normalized by said determined sizes of said subset of documents.
10. The method of claim 9, wherein said determining of the collection rating comprises further subdividing said subset of documents into a plurality of groups in accordance with their determined sizes, and applying a weight to the document rating determined for said rating scale for each document of the subset in accordance with the document's size group classification.
11. The method of claim 10, wherein weights are applied to said determined document ratings for said rating scale as follows:
Document size range (bytes)    Weight
< 500                          1
500-999                        4
1000-4999                      7
5000-9999                      10
> 9999                         13


12. The method of claim 9, wherein said determining of the collection rating comprises further subdividing said subset of documents into a plurality of groups in accordance with their determined ratings for said rating scale, and applying a weight to the document rating determined for said rating scale for each document of the subset in accordance with the document's rate group classification.
13. The method of claim 12, wherein weights are applied to said determined document ratings for said rating scale as follows:
Determined document rating for said rating scale    Weight
0                                                   −0.5
1                                                   0.5
2                                                   3
3                                                   6


14. The method of claim 9, wherein said determining of the collection rating comprises computing the collection rating for said rating scale as follows:

$$CR = \frac{\sum_{i,j} r_i \, w_j \log\left(N_{ij} + 1\right)}{\sum_{i,j} w_j \log\left(N_{ij} + 1\right)}$$

where CR is the collection rating for said rating scale; r_(i) is the weight applied for document rating group i; w_(j) is the weight applied for document size group j; N_(ij) is the number of pages in the collection with document rating i and in document size group j for said rating scale.
15. The method of claim 9, wherein said first collection of documents are web pages of a web site, and said first subset of documents are textual documents of said web site.
16. A method of operation on one or more data processing machines, the method comprising: determining whether a first document collection comprises at least one document linked to at least one other document of at least one other second document collection; determining a collection rating for a rating scale for each of said at least one other second document collection if said first document collection is determined to comprise at least one document linked to at least one other document of at least one other second document collection; determining whether said first document collection comprises at least one document being linked by at least one other document of at least one other third document collection; determining a collection rating for said rating scale for each of said at least one other third document collection if said first document collection is determined to comprise at least one document linked by at least one other third document collection; and determining a link rating for said rating scale for said first document collection based on either said determined collection rating or ratings for said rating scale for said at least one other second document collection, or said determined collection rating or ratings for said rating scale for said at least one other third document collection, or both, depending on whether collection rating or ratings are determined for said rating scale for said at least one other second document collection, said at least one other third document collection or both.
17. The method of claim 16, wherein each of said determining of a collection rating for said rating scale for each of said at least one other second or third document collection comprises determining document ratings for said rating scale for documents of the particular document collection, and sizes of the documents, and determining the collection rating for the particular document collection based on the determined document ratings and the determined sizes.
18. The method of claim 16, wherein said determining of a link rating comprises summing said collection rating or ratings determined for said rating scale for said at least one other second or third document collection, and determining the link rating based on the result of said summing.
19. The method of claim 18, wherein said determining of the link rating based on the result of said summing comprises determining the link rating based on the result of said summing as follows:
Result of said summing (RS)                                         Link rating
RS less than −2                                                     −1.0
RS greater than or equal to −2, but less than −1                    −0.5
RS greater than or equal to −1, but less than or equal to −0.5      0
RS greater than −0.5, but less than or equal to 1.5                 0.5
RS greater than 1.5, but less than or equal to 3                    1.0
RS greater than 3, but less than or equal to 4                      1.5
RS greater than 4                                                   2.0


20. An apparatus comprising: storage medium having stored therein a plurality of programming instructions designed to enable said apparatus to determine a first collection rating for a first rating scale for contents of a first document collection, determine a first link rating for said first rating scale for contents linked to or linked by contents of said first document collection, and modify said first collection rating for said first rating scale for contents of said first document collection based on said determined first link rating for said first rating scale for contents linked to or linked by contents of said first document collection; and at least one processor coupled to the storage medium to execute the programming instructions.
21. The apparatus of claim 20, wherein said programming instructions are designed to enable the apparatus to perform said determining of a first collection rating by determining said first collection rating based on document ratings of a first subset of documents of said first collection of documents, and sizes of the documents of the first subset of documents of the first document collection.
22. The apparatus of claim 21, wherein said first subset of documents of said first document collection consists of first textual documents of said first document collection.
23. The apparatus of claim 20, wherein said programming instructions are designed to enable the apparatus to perform said determining of a first link rating by determining at least a second collection rating for at least a second document collection with documents linked to or linked by documents of said first document collection, and determining said first link rating based on said determined at least a second collection rating of said at least a second document collection.
24. The apparatus of claim 20, wherein said programming instructions are designed to enable the apparatus to perform said modifying of the first collection rating by replacing the determined first collection rating with said determined first link rating.
25. The apparatus of claim 20, wherein said programming instructions are designed to enable the apparatus to perform said modifying of the first collection rating by adding said determined first link rating to the determined first collection rating.
26. The apparatus of claim 20, wherein said programming instructions are designed to enable the apparatus to perform said modifying of the first collection rating by subtracting said determined first link rating from the determined first collection rating.
27. The apparatus of claim 20, wherein said first document collection is a web site, and said contents of said first document collection are web pages.
28. An apparatus comprising: storage medium having stored therein a plurality of programming instructions designed to enable said apparatus to determine document ratings for a rating scale for a subset of documents of a document collection, determine sizes of the documents of said subset, determine a collection rating for said rating scale for said document collection based on said determined document ratings of said subset of documents, and normalized by said determined sizes of said subset of documents; and at least one processor coupled to the storage medium to execute the programming instructions.
29. The apparatus of claim 28, wherein said programming instructions are designed to enable the apparatus to perform said determining of the collection rating by further subdividing said subset of documents into a plurality of groups in accordance with their determined sizes, and applying a weight to the document rating determined for said rating scale for each document of the subset in accordance with the document's size group classification.
30. The apparatus of claim 29, wherein said programming instructions are designed to enable the apparatus to apply weights to said determined document ratings for said rating scale as follows:
Document size range (bytes)    Weight
< 500                          1
500-999                        4
1000-4999                      7
5000-9999                      10
> 9999                         13


31. The apparatus of claim 28, wherein said programming instructions are designed to enable the apparatus to perform said determining of the collection rating by further subdividing said subset of documents into a plurality of groups in accordance with their determined ratings for said rating scale, and applying a weight to the document rating determined for said rating scale for each document of the subset in accordance with the document's rate group classification.
32. The apparatus of claim 31, wherein said programming instructions are designed to enable the apparatus to apply weights to said determined document ratings for said rating scale as follows:

Determined document rating for said rating scale    Weight
0                                                   −0.5
1                                                   0.5
2                                                   3
3                                                   6
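The rating grouping of claims 31 and 32 is likewise a lookup from a determined document rating to a weight. A minimal Python sketch, assuming document ratings are the integers 0 to 3 listed in the table; `RATING_WEIGHTS`, `rating_weight` and `weighted_ratings` are hypothetical names.

```python
# Hypothetical encoding of the claim 32 table: determined document
# rating (assumed to be an integer 0-3) -> weight.
RATING_WEIGHTS = {0: -0.5, 1: 0.5, 2: 3.0, 3: 6.0}


def rating_weight(doc_rating: int) -> float:
    """Return the claim 32 weight for a determined document rating."""
    if doc_rating not in RATING_WEIGHTS:
        raise ValueError(f"no weight defined for document rating {doc_rating!r}")
    return RATING_WEIGHTS[doc_rating]


def weighted_ratings(doc_ratings):
    """Apply the claim 32 weight to each document rating in a subset."""
    return [rating_weight(r) for r in doc_ratings]
```

For example, `weighted_ratings([0, 0, 3])` returns `[-0.5, -0.5, 6.0]`. Note that the table is not linear in the rating: one document rated 3 carries the same weight magnitude as twelve documents rated 0, and the negative weight for rating 0 lets such documents lower any sum of weights.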


33. The apparatus of claim 28, wherein said programming instructions are designed to enable the apparatus to perform said determining of the collection rating by computing the collection rating for said rating scale as follows:
$CR = \frac{\sum_{i,j} r_i \, w_j \log(N_{ij} + 1)}{\sum_{i,j} w_j \log(N_{ij} + 1)}$

where $CR$ is the collection rating for said rating scale; $r_i$ is the weight applied for document rating group $i$; $w_j$ is the weight applied for document size group $j$; and $N_{ij}$ is the number of pages in the collection having document rating $i$ for said rating scale and falling in size group $j$.
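Because every factor $w_j \log(N_{ij} + 1)$ is non-negative, the claim 33 formula is a weighted mean of the rating weights $r_i$, with each (rating group, size group) cell weighted by its size weight and a logarithmically damped page count. The Python sketch below assumes, although claim 33 itself does not say so, that $r_i$ and $w_j$ come from the tables of claims 32 and 30, and that each document is supplied as a hypothetical `(rating, size_in_bytes)` pair; `collection_rating` is an illustrative name.

```python
import math
from bisect import bisect_right
from collections import Counter

# Assumed weights, taken from the tables of claims 32 and 30.
RATING_WEIGHTS = {0: -0.5, 1: 0.5, 2: 3.0, 3: 6.0}  # r_i
SIZE_BOUNDS = [500, 1000, 5000, 10000]            # exclusive upper bounds, bytes
SIZE_WEIGHTS = [1, 4, 7, 10, 13]                  # w_j


def collection_rating(docs):
    """docs: iterable of (document_rating, size_in_bytes) pairs.

    Returns CR = sum r_i w_j log(N_ij + 1) / sum w_j log(N_ij + 1).
    """
    # N_ij: number of documents with rating i in size group j.
    counts = Counter()
    for rating, size in docs:
        counts[(rating, bisect_right(SIZE_BOUNDS, size))] += 1

    numerator = denominator = 0.0
    # Cells with N_ij = 0 contribute log(1) = 0 to both sums,
    # so iterating over the observed cells only is equivalent.
    for (i, j), n_ij in counts.items():
        cell_weight = SIZE_WEIGHTS[j] * math.log(n_ij + 1)
        numerator += RATING_WEIGHTS[i] * cell_weight
        denominator += cell_weight

    if denominator == 0:
        raise ValueError("cannot rate an empty collection")
    return numerator / denominator
```

For example, two rating-0 pages of 1200 and 800 bytes plus one rating-3 page of 15000 bytes give $72.5/24$, about 3.02. The base of the logarithm does not matter: changing it scales the numerator and denominator by the same constant.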
34. The apparatus of claim 28, wherein said first collection of documents are web pages of a web site, and said first subset of documents are textual documents of said web site.
35. An apparatus comprising: storage medium having stored therein a plurality of programming instructions designed to enable said apparatus to determine whether a first document collection comprises at least one document linked to at least one other document of at least one other second document collection, determine a collection rating for a rating scale for each of said at least one other second document collection if said first document collection is determined to comprise at least one document linked to at least one other document of at least one other second document collection, determine whether said first document collection comprises at least one document being linked by at least one other document of at least one other third document collection, determine a collection rating for said rating scale for each of said at least one other third document collection if said first document collection is determined to comprise at least one document linked by at least one other third document collection, and determine a link rating for said rating scale for said first document collection based on either said determined collection rating or ratings for said rating scale for said at least one other second document collection, or said determined collection rating or ratings for said rating scale for said at least one other third document collection, or both, depending on whether collection rating or ratings are determined for said rating scale for said at least one other second document collection, said at least one other third document collection or both; and at least one processor coupled to the storage medium to execute the programming instructions.
36. The apparatus of claim 35, wherein said programming instructions are designed to enable the apparatus to perform each of said determining of a collection rating for said rating scale for each of said at least one other second or third document collection by determining document ratings for said rating scale for documents of the particular document collection, and sizes of the documents, and determining the collection rating for the particular document collection based on the determined document ratings and the determined sizes.
37. The apparatus of claim 35, wherein said programming instructions are designed to enable the apparatus to perform said determining of a link rating by summing said collection rating or ratings determined for said rating scale for said at least one other second or third document collection, and determining the link rating based on the result of said summing.
38. The apparatus of claim 37, wherein said programming instructions are designed to enable the apparatus to perform said determining of the link rating based on the result of said summing by determining the link rating based on the result of said summing as follows:

Result of said summing (RS)                                         Link rating
RS less than −2                                                     −1.0
RS greater than or equal to −2, but less than −1                    −0.5
RS greater than or equal to −1, but less than or equal to −0.5      0
RS greater than −0.5, but less than or equal to 1.5                 0.5
RS greater than 1.5, but less than or equal to 3                    1.0
RS greater than 3, but less than or equal to 4                      1.5
RS greater than 4                                                   2.0
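Claims 35, 37 and 38 together describe summing the collection ratings of the linked-to (second) and linked-by (third) collections and mapping the sum through a banded table. A Python sketch, assuming those neighboring collection ratings have already been computed elsewhere and are passed in as plain numbers; `link_rating_from_sum` and `link_rating` are hypothetical names, and returning `None` when there are no linked collections is an assumption the claims do not address.

```python
from typing import Iterable, Optional


def link_rating_from_sum(rs: float) -> float:
    """Map the summed neighbor ratings (RS) to a link rating per claim 38.

    Each comparison mirrors the inclusive/exclusive bounds of the table.
    """
    if rs < -2:
        return -1.0
    if rs < -1:        # -2 <= RS < -1
        return -0.5
    if rs <= -0.5:     # -1 <= RS <= -0.5
        return 0.0
    if rs <= 1.5:      # -0.5 < RS <= 1.5
        return 0.5
    if rs <= 3:        # 1.5 < RS <= 3
        return 1.0
    if rs <= 4:        # 3 < RS <= 4
        return 1.5
    return 2.0         # RS > 4


def link_rating(linked_to: Iterable[float],
                linked_by: Iterable[float]) -> Optional[float]:
    """Sum the collection ratings of the linked-to and linked-by
    collections (claim 37), then map the sum per claim 38.

    Returns None when neither kind of linked collection exists (assumption).
    """
    ratings = list(linked_to) + list(linked_by)
    if not ratings:
        return None
    return link_rating_from_sum(sum(ratings))
```

For example, `link_rating([1.0, 2.5], [0.7])` sums to 4.2 and so returns 2.0. Because the table is applied to a sum rather than an average, a collection with many highly rated neighbors saturates at the top band of 2.0.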